MG205: Econometrics Theory and Applications

Topic 6: Limitations of Linear Regression

José Ignacio González-Rojas

London School of Economics and Political Science

December 1, 2025

Today We Examine When OLS Fails

From Functional Forms to Model Limitations

Last week covered

  • Polynomial terms for non-linear relationships
  • Interaction terms when effects depend on context
  • Log transformations for elasticities
  • F-tests for multiple restrictions

Today’s focus

  • When does omitted variable bias matter?
  • Prediction vs causal inference
  • Reverse causality and simultaneous equations
  • Sample selection and non-random sampling

Understanding when OLS breaks down shapes how we interpret every regression

Exercise 1: Does a Commodities Trader Care About Omitted Variables?

A Trader Predicts Tomorrow’s Oil Price

A commodities trader builds a model to forecast tomorrow’s oil price

\[\widehat{\text{oil price}}_{t+1} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{inventory}_{t}\]

She omits variables like geopolitical tensions, OPEC decisions, and dollar strength.

Should she worry about omitted variable bias?

The answer depends entirely on her objective

Two Models Serve Fundamentally Different Purposes

Forecasting vs Understanding Causation

Linear Projection (Prediction)

\[y = x'\beta + e\]

\[\mathbb{E}[x \cdot e] = 0\]

  • Finds best linear predictor
  • Orthogonality holds by construction
  • Goal: minimise forecast error
  • Include anything that helps predict \(y\)

Linear Regression (Causal)

\[y = x'\beta + e\]

\[\mathbb{E}[e | x] = 0\]

  • Estimates causal effects
  • Exogeneity is an assumption
  • Goal: understand how \(x\) affects \(y\)
  • Include only confounders

Orthogonality Is Weaker Than Exogeneity

The Mathematical Distinction

Projection guarantees unconditional orthogonality

\[\mathbb{E}[x \cdot e] = 0\]

  • Errors uncorrelated with \(x\) on average
  • Automatic from minimising squared prediction error

Causal regression requires conditional mean independence

\[\mathbb{E}[e | x] = 0\]

  • Error has zero mean at every value of \(x\)
  • This is an assumption about the world

Exogeneity \(\implies\) Orthogonality, but not the reverse
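The gap between the two conditions can be seen in a small simulation (a hypothetical sketch, not from the lecture): when the true relationship is nonlinear, least squares still makes the residuals orthogonal to \(x\) on average, but the error does not have zero mean at every value of \(x\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)        # true conditional mean is nonlinear

# Least-squares fit of the linear projection y = b0 + b1*x
X = np.column_stack([np.ones(n), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - b0 - b1 * x

print(np.mean(x * e))     # ~0: orthogonality holds by construction
print(np.mean(e[x > 1]))  # clearly positive: E[e | x] is not zero
```

Orthogonality is mechanical (a property of the fit); conditional mean independence is a claim about the world that this data-generating process violates.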

The Trader Cares About Accuracy, Not Causation

Why Omitted Variables Don’t Matter for Prediction

Her objective

  • Predict \(\text{oil price}_{t+1}\) accurately
  • Minimise \((y_{t+1} - \hat{y}_{t+1})^2\)
  • Make profitable trades

What projection delivers

  • Best linear forecast given information
  • Optimal combination of available data
  • Correlation is sufficient

What she does NOT need

  • Causal effect of inventory on price
  • Understanding of market mechanisms
  • Unbiased structural parameters

What happens when we omit OPEC decisions?

  • May reduce forecast accuracy
  • But doesn’t create “bias”
  • Her \(\hat{\beta}_1\) captures predictive relationship

Omitted variable bias is a causal concept—it has no meaning in pure prediction
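A hypothetical sketch of the trader's situation (made-up coefficients, with `opec` assumed independent of `inventory`): dropping a predictor raises forecast error, but the short model's residuals stay orthogonal to the included regressor, so nothing is "biased" in the predictive sense.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
inventory = rng.normal(size=n)
opec = rng.normal(size=n)                  # omitted predictor
price = 0.5 * inventory + 0.8 * opec + rng.normal(size=n)

def fit(X, y):
    # Least-squares coefficients and residuals
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, y - X @ b

ones = np.ones(n)
b_short, e_short = fit(np.column_stack([ones, inventory]), price)
b_long, e_long = fit(np.column_stack([ones, inventory, opec]), price)

print(np.mean(inventory * e_short))  # ~0: orthogonality holds either way
print(np.mean(e_short**2))           # ~1.64: larger forecast MSE...
print(np.mean(e_long**2))            # ~1.00: ...than the fuller model
```

Omitting `opec` costs the trader accuracy, not validity: both models are best linear predictors given their respective information sets.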

Identification, Estimation, and Inference

Why the Distinction Matters for Our Trader

Goal                           Requires Causation?   Model Needed
Predict oil price tomorrow     No                    Projection
Understand what drives prices  Yes                   Regression

The trader’s situation

  • She wants accurate forecasts, not causal mechanisms
  • Omitting OPEC decisions may reduce accuracy, but doesn’t create “bias”
  • Projection coefficients capture predictive relationships—that’s enough

For forecasting, causal identification is irrelevant

Exercise 2: An NGO Targets Its Campaign

Reaching the Maximum Number of People

The NGO’s Problem

An NGO wants to maximise reach for a health campaign. They model:

\[\widehat{\text{reach}}_i = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{social media budget}_i + \hat{\beta}_2 \cdot \text{demographics}_i\]

They omit variables like local trust in institutions, existing health infrastructure, and cultural factors.

Their objective

Allocate budget to locations where predicted reach is highest.

Prediction vs Evaluation Require Different Models

Same Data, Different Questions

Identify high-reach locations (prediction)

  • Include all predictive variables
  • Omitted variables reduce accuracy
  • No causal interpretation needed
  • Projection model sufficient

\[\mathbb{E}[x \cdot e] = 0 \; \checkmark\]

Measure campaign effectiveness (causal)

  • Must control for confounders
  • Omitted variables create bias
  • Need causal interpretation
  • Regression model required

\[\mathbb{E}[e | x] = 0 \; \text{(required)}\]

The same organisation may need both models for different decisions

When Would OVB Matter for the NGO?

Shifting from Prediction to Causation

Prediction question (correlation sufficient)

“Where should we allocate next year’s budget for maximum reach?”

  • Historical data predicts well
  • No causal mechanism needed
  • Omitted variables reduce fit, not create bias

Causal question (exogeneity required)

“Does increasing social media budget cause higher reach?”

  • Need to isolate budget effect
  • Omitting trust/infrastructure biases estimate
  • Policy conclusions require causal model

Always ask: “Am I trying to predict or to understand causation?”

Exercise 4: Training and Productivity

We Want to Change the World, Not Just Describe It

The Policy Motivation

The firm’s question

“If we increase training, will productivity rise?”

What we estimate

\[\text{productivity}_i = \beta_0 + \beta_1 \cdot \text{training}_i + e_i\]

What we find

\[\hat{\beta}_1 = 0.15 \quad (p < 0.01)\]

“Each hour of training is associated with a 0.15 unit productivity increase”

The temptation

  • Implement training programme
  • Expect a 0.15 unit gain per training hour
  • Justify investment to board

The problem

  • Association \(\neq\) causation
  • What if estimate is biased?

Firm Size Confounds the Relationship

The Causal Structure

Two correlations contaminate our estimate

  • Larger firms invest more in training (economies of scale in HR)
  • Larger firms are more productive (market power, better technology)

Our Estimate Captures More Than Training Effects

Decomposing What \(\hat{\beta}_1\) Measures

\[\mathbb{E}[\hat{\beta}_1] = \underbrace{\beta_1}_{\text{true training effect}} + \underbrace{\gamma_2 \cdot \delta_1}_{\text{firm size effect}}\]

where:

  • \(\gamma_2 > 0\): firm size increases productivity
  • \(\delta_1 > 0\): training correlates with firm size

\[\text{Bias} = (+) \times (+) = (+)\]

Our estimate \(\hat{\beta}_1\) overstates the true training effect
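The decomposition can be checked numerically. This is a hypothetical simulation with assumed values \(\beta_1 = 0.05\), \(\gamma_2 = 0.4\), \(\delta_1 = 0.5\), where \(\delta_1\) is taken as the auxiliary-regression coefficient of firm size on training (the quantity that enters the bias formula).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta1, gamma2, delta1 = 0.05, 0.4, 0.5   # assumed true values

training = rng.normal(size=n)
# Firm size co-moves with training; delta1 is the auxiliary slope
size = delta1 * training + rng.normal(size=n)
productivity = beta1 * training + gamma2 * size + rng.normal(size=n)

# Short regression (size omitted): slope = cov / var
b_short = np.cov(training, productivity)[0, 1] / np.var(training)

# Long regression (size included)
X = np.column_stack([np.ones(n), training, size])
b_long = np.linalg.lstsq(X, productivity, rcond=None)[0]

print(b_short)    # ~0.25 = beta1 + gamma2 * delta1
print(b_long[1])  # ~0.05 = beta1: controlling for size removes the bias
```

The short regression lands almost exactly on \(\beta_1 + \gamma_2 \delta_1\), matching the decomposition above.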

Why Correlation Fails for Policy

Unless we change firm size, training alone won’t change productivity

What our estimate tells us

Firms with more training have higher productivity.

What it does NOT tell us

Giving more training will increase productivity.

The reality

  • The relationship is driven by firm size
  • Smaller firms that increase training won’t see gains
  • The causal path is: Size → Productivity and Size → Training
  • There is no direct Training → Productivity path (or it is much weaker)

An Alternative Story Reveals the Problem

Same Data, Different Mechanism

Story A (what we assumed)

  • Training develops skills → Skills increase output → Productivity rises

Story B (what’s actually happening)

  • Large firms have resources → Resources fund training programmes
  • Large firms have market power → Market power increases profits/productivity

Both stories generate

\[\text{cov}(\text{training}, \text{productivity}) > 0\]

Without controlling for firm size, we cannot distinguish these stories

Exercise 5: Does Beauty Cause Higher Wages?

The Reverse Causality Problem

Two Plausible Causal Directions

  1. Beauty → Wages
  • Taste-based discrimination by employers
  • Customer preferences in service industries
  • Productivity benefits (confidence, social skills)
  2. Wages → Beauty
  • Higher earners afford better skincare, cosmetics
  • Access to aesthetic procedures
  • Better health investments

Cross-Sectional Data Cannot Separate These Effects

Why Exogeneity Fails

We estimate

\[\text{wage}_i = \beta_0 + \beta_1 \cdot \text{beauty}_i + e_i\]

But the error contains

\[e_i = \gamma_1 \cdot \text{wage}_i + \text{other factors}\]

Therefore

\[\mathbb{E}[e_i | \text{beauty}_i] = \mathbb{E}[\gamma_1 \cdot \text{wage}_i | \text{beauty}_i] \neq 0\]

Beauty correlates with the error because wages affect both

A Solution: Exploit the Arrow of Time

The Future Cannot Cause the Present

The identification strategy

Compare:

  • Today’s beauty rating
  • Tomorrow’s wage change

\[\Delta\text{wage}_{i,t+1} = \beta_0 + \beta_1 \cdot \text{beauty}_{it} + e_{it}\]

Why this works

  • Future wage changes cannot affect current beauty
  • Removes reverse causality channel

What we identify

\[\beta_1 = \frac{\text{cov}(\text{beauty}_t, \Delta\text{wage}_{t+1})}{\text{var}(\text{beauty}_t)}\]

  • Effect of beauty on future wage growth
  • Temporal ordering establishes direction
  • More credible causal interpretation

Using time as a natural ordering helps establish causality
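A hypothetical two-period sketch (assumed effects: \(\beta = 0.1\) of beauty on wage growth, \(\alpha = 0.8\) reverse channel from wages to beauty): the levels regression mixes both channels, while regressing future wage changes on today's beauty recovers the assumed effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
beta = 0.1     # assumed causal effect of beauty on wage growth
alpha = 0.8    # assumed reverse channel: wages improve beauty

wage_t = rng.normal(size=n)
beauty_t = alpha * wage_t + rng.normal(size=n)            # wage -> beauty
wage_t1 = wage_t + beta * beauty_t + rng.normal(size=n)   # beauty -> growth

def slope(x, y):
    # OLS slope: cov(x, y) / var(x)
    return np.cov(x, y)[0, 1] / np.var(x)

print(slope(beauty_t, wage_t))            # ~0.49: mostly reverse causality
print(slope(beauty_t, wage_t1 - wage_t))  # ~0.10: recovers beta
```

The change regression works because future wage shocks cannot feed back into beauty already measured at \(t\).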

Exercise 6: Does Health Cause Higher Earnings?

The Same Simultaneity Problem

Defining the Variables Precisely

\[\text{earnings}_i = \beta_0 + \beta_1 \cdot \mathbb{1}[i \text{ is healthy}] + e_i\]

Two-Way Causation Creates Identification Failure

The Causal Structure

  1. Health → Earnings
    • Healthier workers more productive
    • Fewer absences, more energy
    • Better cognitive function
  2. Earnings → Health
    • Higher earners afford better healthcare
    • Less financial stress improves health
    • Better nutrition and living conditions

The Solution Mirrors Exercise 5

Exploit Temporal Ordering

Strategy

\[\Delta\text{earnings}_{i,t+1} = \beta_0 + \beta_1 \cdot \mathbb{1}[i \text{ is healthy at } t] + e_{it}\]

Interpretation

  • Current health predicts future earnings changes
  • Future earnings changes cannot cause current health
  • Temporal precedence establishes causal direction

The arrow of time provides identification when simultaneity threatens

Exercise 7: i.i.d. Sampling vs Exogeneity

AS2 and AS5 Serve Different Purposes

Both can fail independently

AS2: Random Sampling

\[\text{cov}(y_i, y_j) = 0 \text{ for } i \neq j\]

  • About data collection
  • Ensures sample represents population
  • Affects inference (standard errors)
  • Violated by:
    • Clusters
    • Time series

AS5: Exogeneity

\[\mathbb{E}[e_i | x_i] = 0\]

  • About model specification
  • Ensures no confounding
  • Affects identification (bias)
  • Violated by:
    • Omitted variables
    • Reverse causality

Crucial Terminology: Parameters, Estimators, Estimates

Getting the Language Right

Parameter (Estimand)

\[\beta_1\]

  • Population quantity
  • Fixed, unknown value
  • Identified or not

Estimator

\[\hat{\beta}_1 = \frac{\widehat{\text{cov}}(x,y)}{\widehat{\text{var}}(x)}\]

  • Function of data
  • Random variable
  • Biased or unbiased

Estimate

\[\hat{\beta}_1 = 0.073\]

  • Computed number
  • Single realisation
  • Neither biased nor unbiased

We say “biased estimator”—never “biased parameter” or “biased estimate”

Which Assumptions Enable What?

Assumption   Statement                                               Ident.   Estim.   Infer.
AS1          Linearity: \(y = \beta_0 + \beta_1 x + e\)              ✓        ✓
AS2          Random sampling                                                  ✓        ✓
AS3          Variation in \(x\): \(\text{var}(x) > 0\)               ✓        ✓
AS4          Zero mean: \(\mathbb{E}[e] = 0\)                        ✓        ✓
AS5          Exogeneity: \(\mathbb{E}[e \mid x] = 0\)                ✓
AS6          Homoskedasticity: \(\text{var}(e \mid x) = \sigma^2\)                    ✓
AS7          Normality: \(e \sim N(0, \sigma^2)\)                                     ✓

AS1-AS5: Are we estimating something meaningful? AS6-AS7: Is our uncertainty correct?

Exercise 8: Earnings and School Desk Assignment

The Alumni Meeting Problem

\[\text{earnings}_i = \beta_0 + \beta_1 \cdot \text{desk number}_i + e_i\]

Lower desk numbers (front of class) → higher earnings

\[\begin{aligned} H_{0}&: \beta_1 = 0 \\ H_{1}&: \beta_1 < 0 \end{aligned}\]

What seems fine

  • Desk assignment was random
  • No omitted variable bias
  • No reverse causality

The sampling method

Survey conducted at alumni meeting

Who attends alumni meetings?

  • Successful graduates
  • Those with high earnings
  • Want to showcase success

Sample Selection Along the Dependent Variable

Non-Random Sampling Creates Bias

\(\mathbb{P}[i\text{ attended meeting} | \text{earnings}_i] \text{ is increasing in earnings}\)

Why this biases our estimate

If \(\beta_1 < 0\) (front seats → higher earnings):

  • Front-seat alumni: High earnings → likely to attend
  • Back-seat alumni: Lower earnings → less likely to attend

The consequence

  • We oversample successful back-seat students (the exceptions) and undersample unsuccessful front-seat students.
  • This creates positive bias: \(\mathbb{E}[\hat{\beta}_1] > \beta_1\)

Random treatment assignment doesn’t help when sample selection depends on the outcome

Who Shows Up to the Meeting?

Population relationship

  • Front seats: Mix of high and low earners
  • Back seats: Mix of high and low earners
  • True slope: \(\beta_1 < 0\)

What we’d estimate with random sample

\[\hat{\beta}_1 \approx \beta_1 < 0\]

Meeting attendees (selected sample)

  • Front seats: Mostly high earners (typical)
  • Back seats: Only high earners (atypical)
  • Selected slope: Flatter (closer to zero)

What we estimate

\[\hat{\beta}_1 > \beta_1\]

The bias is positive, pulling the negative slope toward zero

Random treatment assignment doesn’t help when sample selection depends on the outcome
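A hypothetical simulation of the alumni-meeting problem (assumed true slope \(-0.5\), attendance probability increasing in earnings): the full-sample slope is recovered, but the attendee-only slope is pulled toward zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta1 = -0.5                                   # assumed true slope
desk = rng.uniform(0, 1, size=n)               # random desk assignment
earnings = beta1 * desk + rng.normal(size=n)

# Attendance probability increases with earnings (selection on the outcome)
p_attend = 1 / (1 + np.exp(-3 * earnings))
attend = rng.uniform(size=n) < p_attend

def slope(x, y):
    # OLS slope: cov(x, y) / var(x)
    return np.cov(x, y)[0, 1] / np.var(x)

print(slope(desk, earnings))                   # ~ -0.5 in the full sample
print(slope(desk[attend], earnings[attend]))   # closer to zero among attendees
```

Randomising desks fixes exogeneity, but conditioning the sample on the outcome still attenuates the estimate, exactly as the slide describes.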

Summary

Six Lessons About LR Limitations

1. Projection guarantees \(\mathbb{E}[xe] = 0\) by construction, while causal regression requires \(\mathbb{E}[e|x] = 0\)

  • OVB irrelevant for forecasting

2. Correlation of \(z\) with both \(x\) and \(y\) creates confounding

  • Policy based on correlation will fail

3. Reverse causality violates AS5

  • Simultaneity:
    • beauty ↔︎ wages
    • health ↔︎ earnings
  • One solution: exploit temporal ordering

4. Non-random sampling violates AS2

  • Selection on \(y\) distorts relationship

5. Terminology matters

  • Parameters: identified or not
  • Estimators: biased or unbiased
  • Estimates: neither

6. Random sampling affects inference, while exogeneity affects identification

  • Both can fail independently

Next Class: Exercises 9-14

Measurement Error, Heteroskedasticity, and Applications

What we’ll cover

  • Measurement error: Bias when \(x\) measured with noise
  • Heteroskedasticity: Non-constant error variance

Why it matters

  • Real data always contains measurement error
  • Heteroskedasticity ubiquitous in cross-sections
  • Robust standard errors as practical solution
  • Distinguishing bias problems from inference problems

Understanding which assumption fails guides the solution